Antipattern: Filling in the Corners

Let’s explore the issues that arise when using the Pseudo Key Neat-Freak antipattern for filling up the gaps.

There are two ways we might fill the perceived gap.

Assigning numbers out of sequence#

Instead of allocating a new primary key value using the automatic pseudo key mechanism, we might want to make any new row use the first unused primary key value. This way, as we insert data, we naturally make gaps fill in.

bug_id status product_name
1 OPEN Open RoundFile
2 FIXED ReConsider
4 OPEN ReConsider
3 NEW Visual TurboBuilder

However, we have to run an unnecessary self-join query to find the lowest unused value:

Finding the lowest unused value using JOIN

There were four values inserted in the Bugs table, so when we generated the next number, we got 5 in the output.

Earlier in the course, we looked at a concurrency issue when we try to allocate a unique primary key value by running a query such as SELECT MAX(bug_id)+1 1 FROM Bugs.

Special scope for sequences

This has the same flaw as two applications trying to find the lowest unused value at the same time: as both try to use the same value as a primary key value, one succeeds, and the other gets an error. This method is both inefficient and prone to errors.

Renumbering existing rows#

We may find it more urgent to make the primary key values contiguous and that waiting for new rows to fill in the gaps won’t fix the issue quickly enough. We may have to think of a strategy to update the key values of existing rows to eliminate gaps and make all the values contiguous.

Retrieving data from Bugs table

This usually means that we would find the row with the highest primary key value and update it with the lowest unused value. For example, we could update the value 4 to 3:

Updating primary key of a record in the Bugs table

Let’s retrieve the data in the following playground to see the effects of this query on the database.

Retrieving data from Bugs table after updating primary key of a record
bug_id status product_name
1 NEW Open RoundFile
2 FIXED ReConsider
3 DUPLICATE ReConsider

To accomplish this, we need to find an unused key value using a method similar to the previous one for inserting new rows. We also need to run the UPDATE statement to reassign the primary key value. Either one of these steps is susceptible to concurrency issues. We need to repeat the steps many times to fill a wide gap in the numbers.

We must also propagate the changed value to all child records that reference the rows we renumber. This is easiest if we declared foreign keys with the ON UPDATE CASCADE option, but if we didn’t, we would have to disable constraints, update all child records manually, and then restore the constraints. This is a laborious, error-prone process that can interrupt service in our database.

Even if we do accomplish this cleanup, it’s short-lived. When a pseudo key generates a new value, the value is greater than the last value it generated (even if the row with that value has since been deleted or changed), not the highest value currently in the table, as some database programmers assume. Let’s suppose we update the row with the greatest bug_id value 4 to the lower unused value to fill a gap. The next row we insert using the default pseudo key generator will allocate 5, leaving a new gap at 4.

Manufacturing data discrepancies#

Mitch Ratcliffe said, “The computer lets you make more mistakes faster than any other human invention in human history… with the possible exception of handguns and tequila.”

The story at the beginning of this chapter describes some hazards of renumbering primary key values. If another system external to our database depends on identifying rows by their primary keys, then our updates invalidate the data references in that system.

It’s not a good idea to reuse the row’s primary key value because a gap could be the result of deleting or rolling back a row for a good reason. For example, let’s suppose that a user with the account_id 789 is barred from our system for sending offensive emails. Our policies require us to delete the offender’s account, but if we recycle primary keys, we would subsequently assign 789 to another user. Since some offensive emails are still waiting to be read by some recipients, we could get further complaints about account 789. Though innocent, the poor user who has been reassigned that number will catch the blame.

Hence, it’s best to not reallocate pseudo key values just because they seem to be unused.

Synopsis: Pseudo Key Neat-Freak
Solution: Strictly Follow Referencing Rules
Mark as Completed
Report an Issue